A Note about the Vignettes

“A short story is confined to one mood, to which everything in the story pertains. Characters, setting, time, events, are all subject to the mood. And you can try more ephemeral, more fleeting things in a story - you can work more by suggestion - than in a novel. Less is resolved, more is suggested, perhaps.”

-Eudora Welty, Conversations with Eudora Welty, p 86

What follows is a brief introduction covering a few things worth knowing, and then a one- or two-sentence summary of each vignette along with the functions it demonstrates.

The vignettes in PreciseDist use biological data, but if that means nothing to you, rest assured that PreciseDist works with many kinds of data, because a matrix is a matrix is a matrix. One of PreciseDist’s main goals is to help you create a matrix that is optimal for the problem you are trying to solve, and the functions we provide try to be agnostic about where the data come from. Also, although we typically use the word distance throughout the vignettes, PreciseDist produces and works with similarities and correlations as well. What PreciseDist typically tries not to do, however, is decide whether an input is a similarity, a correlation or a distance, so stay aware of how the relationships in your data are numerically defined before running PreciseDist functions.
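
To make the distinction concrete, here is a minimal base R sketch (it uses no PreciseDist functions) that builds a distance, a correlation and a similarity from the same toy matrix. The data and the 1 / (1 + d) conversion are illustrative assumptions, not something PreciseDist prescribes.

```r
set.seed(1)

# Hypothetical toy data: 5 samples (rows) by 4 features (columns).
x <- matrix(rnorm(5 * 4), nrow = 5,
            dimnames = list(paste0("s", 1:5), NULL))

# Euclidean distance between rows: 0 on the diagonal, larger means less alike.
d <- as.matrix(stats::dist(x))

# Pearson correlation between rows: 1 on the diagonal, bounded by -1 and 1.
r <- stats::cor(t(x))

# One common distance-to-similarity conversion: 1 on the diagonal,
# larger means more alike.
s <- 1 / (1 + d)

# The closest sample to s1 is the smallest distance but the largest similarity.
nearest_by_distance   <- names(which.min(d["s1", -1]))
nearest_by_similarity <- names(which.max(s["s1", -1]))
```

The same pair of samples is "closest" under both `d` and `s`, but the numbers point in opposite directions, which is why it matters to know which kind of matrix you are passing along.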

In addition, PreciseDist will always be a work in progress, so if anything is unclear while you are using the package, please do not hesitate to ask a question, suggest an improvement or report a reproducible bug, flaw or inconsistency on the GitHub issues page. Lastly, if a scientific paper has led you to PreciseDist, please note that although the data used in the vignette examples may at times appear to be the same as the data in the paper, the exact code and methodology are not. If you wish to find a verbatim copy of the code used for the results in the paper, please click here.

A Parallel Future

This vignette explains how to use the parallel resources of your computer or cluster to run PreciseDist functions in parallel. Although it contains no PreciseDist functions, it is a simple yet crucial vignette to understand unless you are working with very limited amounts of data.

base::options()

doFuture::registerDoFuture()

future::plan()
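
As a rough idea of how these three calls fit together, here is a minimal sketch. It assumes the future, doFuture and foreach packages are installed; the option name shown, the 1 GiB limit and the worker count are illustrative choices rather than the vignette's own settings.

```r
library(foreach)
library(future)
library(doFuture)

# Raise the size limit (in bytes) for objects exported to workers.
# Assumption: this is the kind of option the vignette sets via base::options().
options(future.globals.maxSize = 1024^3)

# Route foreach's %dopar% through the future framework.
registerDoFuture()

# Run futures in background R sessions on this machine.
plan(multisession, workers = 2)

# A trivial loop to confirm work is dispatched through the registered backend.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Return to sequential processing when finished.
plan(sequential)
```

On a cluster, a different strategy would be passed to `plan()`, while the rest of the setup stays the same.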

Example Workflow

In many ways, this is the main vignette of the package, and it introduces a methodology for using the PreciseDist framework to tackle the same problem in a variety of different ways. Or, as they say, if all roads lead to Rome there is more than one way to skin a cat.

data("data_cell_cycle")

precise_dist_list()

precise_dist()

precise_transform()

precise_correlations()

precise_umap()

precise_heatmap()

trellis_plots()

trellis_heatmap()

precise_fusion()

precise_graph()

precise_viz()

precise_func_fact()

A Similarity Graph of Distances

This is a very meta vignette that shows you how to make and then cluster a graph from a distance matrix of distances. This can be very helpful in deciding which distances one should combine to get a holistic view of either a single dataset or multiple datasets.

precise_correlations()

trellis_viz()

precise_cluster()

trellis_descriptors()

precise_transform() and its keep_string and filter_string parameters

Fighting Overfit with PreciseDist

A PreciseDist function may give you an answer that at times seems too good to be true, so this vignette shows you a few ways of trying to mitigate the false and hollow hope that overfit can endow.

precise_dist() and its partitions parameter

precise_transform() and its add_noise parameter

Clustering with PreciseDist

This vignette shows you several different ways to cluster your results within the PreciseDist framework, and urges you to trust only the results that make sense and are useful, because we believe clustering is more about the journey than the destination.

precise_cluster()

trellis_descriptors()

precise_graph()

Cluster Validation with PreciseDist

In this vignette, we demonstrate a few different ways of determining if your clusters make sense and are useful.

trellis_descriptors() and both its diagnostics and rank parameters

precise_stats()

trellis_pivot()

All Available Views

PreciseDist provides a number of output visualization options for input graphs or matrices, so in this vignette we show them all at once.

trellis_viz()

Viewing Results with Gephi

Although PreciseDist provides many different types of visualizations, Gephi is a wonderful way to take the graphs PreciseDist produces and analyze them further. Also, as an aside, we show you here how to embed downloadable static images into your R Markdown page when trying to include them through local paths is making you confused and crazy.

precise_viz() and its graphml parameter

plotly_embed()

Clustering with Other Algorithms

While PreciseDist offers a clustering solution, the framework focuses considerably more on the before (the distance) and the after (the usefulness) of clusters than the clustering itself. So, this vignette is a code repository for other methods that can also take distances (or similarities or correlations) as input.

proxy::pr_dist2simil()

apcluster::apcluster()

apcluster::apclusterK()

SNFtool::spectralClustering()
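
As a hedged illustration of the general pattern (not the vignette's own code), the sketch below converts a distance matrix to a similarity matrix and hands it to affinity propagation. It assumes the proxy and apcluster packages are installed; the toy data are hypothetical, and the number of clusters found depends on apcluster's default preference setting.

```r
set.seed(42)

# Hypothetical toy data: two well-separated groups of 10 points each.
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# Start from a plain Euclidean distance matrix.
d <- as.matrix(stats::dist(x))

# Convert distances to similarities; proxy's default conversion is 1 / (1 + d).
sim <- proxy::pr_dist2simil(d)

# Affinity propagation takes a similarity matrix as its first argument.
res <- apcluster::apcluster(s = sim)

# Each element of the clusters slot holds the sample indices of one cluster.
cluster_sizes <- lengths(res@clusters)
```

If your starting point is already a similarity or correlation matrix, the conversion step is skipped, which is one more reason to keep track of what kind of matrix you are holding.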